Modeling DNA affinity landscape through two-round support vector regression with weighted degree kernels

Abstract

Background: A quantitative understanding of interactions between transcription factors (TFs) and their DNA binding sites is key to the rational design of gene regulatory networks. Recent advances in high-throughput technologies have enabled high-resolution measurements of protein-DNA binding affinity. Importantly, such experiments revealed the complex nature of TF-DNA interactions: the effect of a nucleotide change on binding affinity was observed to be context dependent. A systematic method that gives high-quality estimates of such complex affinity landscapes is thus essential to the control of gene expression and the advance of synthetic biology.

Results: Here, we propose a two-round prediction method based on support vector regression (SVR) with weighted degree (WD) kernels. In the first round, a WD kernel with shifts and mismatches is used with SVR to detect the importance of subsequences of different lengths at different positions. The subsequences identified as important in the first round are then fed into a second WD kernel to fit the experimentally measured affinities. To our knowledge, this is the first attempt to increase the accuracy of affinity prediction by applying two rounds of string kernels and by identifying a small number of crucial k-mers. The proposed method was tested by predicting the binding affinity landscape of Gcn4p in Saccharomyces cerevisiae using datasets from HiTS-FLIP. Our method explicitly identified important subsequences and showed significant performance improvements when compared with other state-of-the-art methods. Based on the identified important subsequences, we discovered two surprisingly stable 10-mers and one sensitive 10-mer that had not been reported before. Further tests on four other TFs in S. cerevisiae demonstrated the generality of our method.

Conclusion: We proposed in this paper a two-round method to quantitatively model the DNA binding affinity landscape. Since the ability to modify genetic parts to fine-tune gene expression rates is crucial to the design of biological systems, such a tool may play an important role in the success of synthetic biology going forward.
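
To make the kernel mentioned in the abstract concrete, here is a minimal sketch of a basic weighted degree kernel in Python. It is an illustration under stated assumptions, not the authors' implementation: it uses the standard WD weighting beta_d = 2(D - d + 1) / (D(D + 1)), it leaves out the shifts and mismatches that the abstract says the first round uses, and the function names (wd_weights, wd_kernel, gram_matrix) and the example sequences are hypothetical.

```python
def wd_weights(max_degree):
    """Standard WD weights beta_d = 2(D - d + 1) / (D(D + 1)); they sum to 1."""
    D = max_degree
    return [2.0 * (D - d + 1) / (D * (D + 1)) for d in range(1, D + 1)]


def wd_kernel(x, y, max_degree=3):
    """Basic weighted degree kernel between two equal-length sequences.

    For each degree d, counts the positions where the d-mers of x and y
    are identical, then sums the counts weighted by beta_d.
    No shifts or mismatches are modeled (an assumption of this sketch).
    """
    if len(x) != len(y):
        raise ValueError("sequences must have equal length")
    total = 0.0
    for d, beta in enumerate(wd_weights(max_degree), start=1):
        matches = sum(
            1 for i in range(len(x) - d + 1) if x[i:i + d] == y[i:i + d]
        )
        total += beta * matches
    return total


def gram_matrix(seqs, max_degree=3):
    """Pairwise kernel values for a list of equal-length sequences."""
    return [[wd_kernel(a, b, max_degree) for b in seqs] for a in seqs]


# Hypothetical toy sequences, not data from the paper.
example = gram_matrix(["ACGT", "ACGA", "TTGA"], max_degree=2)
```

A Gram matrix of this kind could be passed to any SVR implementation that accepts a precomputed kernel, for example scikit-learn's SVR with kernel="precomputed"; how the second round selects and reweights subsequences is not specified in the abstract and is not attempted here.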